Distributed Entity Disambiguation with Per-Mention Learning

نویسندگان

  • Tiep Mai
  • Bichen Shi
  • Patrick K. Nicholson
  • Deepak Ajwani
  • Alessandra Sala
چکیده

Entity disambiguation, or mapping a phrase to its canonical representation in a knowledge base, is a fundamental step in many natural language processing applications. Existing techniques based on global ranking models fail to capture the individual peculiarities of the words and hence, either struggle to meet the accuracy requirements of many real-world applications or they are too complex to satisfy real-time constraints of applications. In this paper, we propose a new disambiguation system that learns specialized features and models for disambiguating each ambiguous phrase in the English language. To train and validate the hundreds of thousands of learning models for this purpose, we use a Wikipedia hyperlink dataset with more than 170 million labelled annotations. We provide an extensive experimental evaluation to show that the accuracy of our approach compares favourably with respect to many state-of-the-art disambiguation systems. The training required for our approach can be easily distributed over a cluster. Furthermore, updating our system for new entities or calibrating it for special ones is a computationally fast process, that does not affect the disambiguation of the other entities.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Deep Joint Entity Disambiguation with Local Neural Attention

We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations. Key components are entity embeddings, a neural attention mechanism over local context windows, and a differentiable joint inference stage for disambiguation. Our approach thereby combines benefits of deep learning with more traditional approaches such as graphic...

متن کامل

Robust Named Entity Disambiguation with Random Walks

Named Entity Disambiguation is the task of assigning entities from a Knowledge Graph (KG) to mentions of such entities in a textual document. The state-of-the-art for this task balances two disparate sources of similarity: lexical, defined as the pairwise similarity between mentions in the text and names of entities in the KG; and semantic, defined through some graph-theoretic property of a sub...

متن کامل

To Link or Not to Link? A Study on End-to-End Tweet Entity Linking

Information extraction from microblog posts is an important task, as today microblogs capture an unprecedented amount of information and provide a view into the pulse of the world. As the core component of information extraction, we consider the task of Twitter entity linking in this paper. In the current entity linking literature, mention detection and entity disambiguation are frequently cast...

متن کامل

Modeling Mention, Context and Entity with Neural Networks for Entity Disambiguation

Given a query consisting of a mention (name string) and a background document, entity disambiguation calls for linking the mention to an entity from reference knowledge base like Wikipedia. Existing studies typically use hand-crafted features to represent mention, context and entity, which is laborintensive and weak to discover explanatory factors of data. In this paper, we address this problem...

متن کامل

Bridge Text and Knowledge by Learning Multi-Prototype Entity Mention Embedding

Integrating text and knowledge into a unified semantic space has attracted significant research interests recently. However, the ambiguity in the common space remains a challenge, namely that the same mention phrase usually refers to various entities. In this paper, to deal with the ambiguity of entity mentions, we propose a novel Multi-Prototype Mention Embedding model, which learns multiple s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1604.05875  شماره 

صفحات  -

تاریخ انتشار 2016